Abstract

This paper describes experiments carried out using a variety of machine-learning methods, including the k-nearest neighborhood method that was used in a previous study, for the translation of tense, aspect, and modality. It was found that the support-vector machine method was the most precise of all the methods tested.


1 Introduction 

Tense, aspect, and modality are known to cause problems in machine translation. In traditional approaches, tense, aspect, and modality have been translated using manually constructed heuristic rules. Recently, however, corpus-based approaches such as the example-based method (k-nearest neighborhood method) have also been used (Murata et al., 1999). For our study, we carried out experiments on the translation of tense, aspect, and modality by using a variety of machine-learning methods, in addition to the k-nearest neighborhood method, and then determined which method was the most precise. In our previous research, in which we studied the utilization of the k-nearest neighborhood method, only the strings at the ends of sentences were used to translate tense, aspect, and modality. In this study, however, we used all of the morphemes from each of the sentences as information, in addition to the strings at the ends of the sentences.

In connection with our approach, we would like to emphasize the following points:

• We obtained better translation of tense, aspect, and modality by using a support-vector machine method than we could by using the k-nearest neighborhood method that we used in our previous study (Murata et al., 1999).

• In our previous study (Murata et al., 1999), we only used the strings at the ends of sentences to translate tense, aspect, and modality. Here, however, we used all of the morphemes in each of the sentences as information, in addition to the strings at the ends of the sentences. Using a statistical test, we were able to confirm that adding the morpheme information from each of the sentences was significantly effective in improving the precision of the translations.

This child always talks back to me, and this <v>is</v> why I hate him.

d
I <v>did not think</v> he was so timid.

Such a busy man as he <v>cannot have</v> any spare time.

Figure 1: Part of the modality corpus

2 Task Descriptions 

For this study we used the modality corpus described in one of our previous papers (Murata et al., 2001). Part of this modality corpus is shown in Figure 1. It consists of a Japanese-English bilingual corpus, and the main verb phrase in each English sentence is tagged with <v>. The symbols placed at the beginning of each Japanese sentence, such as “c” and “d,” indicate categories of tense, aspect, and modality for the sentence. (For example, “c” and “d” indicate “can” and past tense, respectively.)

Table 1: Occurrence rates of categories

Category      Kodansha   White paper
present       0.42       0.41
past          0.36       0.21
imperative    0.05       0.00
perfect       0.04       0.11
“will”        0.03       0.06
progressive   0.03       0.10
“can”         0.02       0.04
others        0.05       0.07

The following categories were used for 
tense, aspect, and modality. 

1. All combinations of each auxiliary verb (“be able to,” “be going to,” “can,” “have to,” “had better,” “may,” “must,” “need,” “ought,” “shall,” “used to,” and “will”) and forms for {present tense, past tense}, {progressive, non-progressive}, {perfect, non-perfect} (2^15 categories)

2. Imperative mood (1 category) 

These categories of tense, aspect, and modality are defined on the basis of the surface expressions of the English sentences. Therefore, if we are able to determine the correct category from a Japanese sentence, we should also be able to translate the Japanese tense, aspect, and modality into English. In this study, only the tags indicating the categories of tense, aspect, and modality and the Japanese sentences were used.

The following two types of corpora were used to construct the modality corpus.

• Example sentences in the Kodansha Japanese-English dictionary (39,660 sentences, 46 categories)

• White papers (5,805 sentences, 30 categories)

The occurrence rates of major categories are shown in Table 1. As can be seen in the table, the present tense occurs most frequently.

3 Machine-Learning Methods 

We used the following four machine-learning methods for our study.¹

• k-nearest neighborhood method

• decision-list method

• maximum-entropy method

• support-vector machine method

In the following subsections, we explain each of these machine-learning methods.

3.1 k-Nearest Neighborhood Method

The domain of machine translation includes a method called the example-based method. In this method, the example most similar to the input sentence is searched for, and the category of the input sentence is chosen based on that example. However, this method uses only one example, so it is weak with respect to noise (i.e., errors in the corpus or other exceptional phenomena). The k-nearest neighborhood method avoids this problem by using the k most similar examples instead of only the single most similar example. The category is chosen on the basis of “voting”² on the k examples. Since this method uses multiple examples, it is capable of providing a stable solution even if a corpus includes noise.

In the k-nearest neighborhood method, since it is necessary to collect similar examples, it is also necessary to define the similarity between each pair of examples. The definition of similarity used in this paper is discussed in the section on features (Section 4). When there is an example that has the same similarity as one of the selected k examples, that example is also used in the “voting.”

3.2 Decision-List Method

In this method, the probability of each category is calculated using one feature $f_j$ ($\in F$, $1 \leq j \leq k$), and the category with the highest probability is judged to be the correct category. The probability that produces a category $a$ in a context $b$ is given by the following equation:

$$p(a|b) = p(a|f_{max}), \quad (1)$$

where $f_{max}$ is defined as

$$f_{max} = \mathrm{argmax}_{f_j \in F} \max_{a_i \in A} p(a_i|f_j), \quad (2)$$

such that $p(a_i|f_j)$ is the occurrence rate of category $a_i$ when the context includes a feature $f_j$.

The decision-list method is simple. However, since it estimates the category on the basis of only one feature, it is a poor machine-learning method.

1 Although there are also decision-tree learning methods such as C4.5, we did not use them for the following two reasons. First, a decision-tree learning method performs worse than other methods for several tasks (Murata et al., 2000; Taira and Haruno, 2000). Second, the number of attributes used in this study was too large, and the performance of C4.5 would become even worse if the number of attributes was decreased so that it could be used.

2 “Voting” means a decision made by the majority.
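The decision-list rule of Equations (1) and (2) can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments: the function names, the romanized toy features, and the use of unsmoothed relative frequencies as the occurrence rates are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train_decision_list(examples):
    """Map each feature to (most frequent category, its occurrence rate).

    `examples` is an iterable of (features, category) pairs. The rate is a
    plain relative frequency; practical decision lists usually smooth it.
    """
    counts = defaultdict(Counter)
    for features, category in examples:
        for f in set(features):
            counts[f][category] += 1
    rules = {}
    for f, cat_counts in counts.items():
        category, n = cat_counts.most_common(1)[0]
        rules[f] = (category, n / sum(cat_counts.values()))
    return rules

def classify(rules, features, default=None):
    """Pick f_max, the known feature with the highest rate (Equation (2)),
    and return its category (Equation (1))."""
    best = max((rules[f] for f in features if f in rules),
               key=lambda rule: rule[1], default=None)
    return best[0] if best else default

# Hypothetical toy data: romanized sentence-final strings and morphemes.
toy = [
    ({"ta"}, "past"),
    ({"ta"}, "past"),
    ({"ta", "kyou"}, "present"),
    ({"kyou"}, "present"),
    ({"ru"}, "present"),
]
rules = train_decision_list(toy)
```

In this toy data the feature "kyou" always occurs with the present tense, so it overrides "ta" whenever both are present. This shows how a single feature decides the category.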


3.3 Maximum-Entropy Method 

In this method, the probability distribution $p(a,b)$ that satisfies Equation (3) and maximizes Equation (4) is calculated, and the category with the maximum probability as calculated from this distribution is judged to be the correct category (Ristad, 1997; Ristad, 1998):

$$\sum_{a \in A, b \in B} p(a,b) g_j(a,b) = \sum_{a \in A, b \in B} \tilde{p}(a,b) g_j(a,b) \quad \text{for } \forall f_j \ (1 \leq j \leq k) \quad (3)$$

$$H(p) = -\sum_{a \in A, b \in B} p(a,b) \log(p(a,b)), \quad (4)$$

where $A$, $B$, and $F$ are a set of categories, a set of contexts, and a set of features $f_j$ ($\in F$, $1 \leq j \leq k$), respectively; $g_j(a,b)$ is a function with a value of 1 when context $b$ includes feature $f_j$ and the category is $a$, and 0 otherwise; and $\tilde{p}(a,b)$ is the occurrence rate of a pair $(a,b)$ in the training data.

In general, the distribution of $\tilde{p}(a,b)$ is very sparse. We cannot use this distribution directly, so we must estimate the true distribution $p(a,b)$ from the distribution of $\tilde{p}(a,b)$. With the maximum-entropy method, we assume that the estimated value of the frequency of each pair of categories and features calculated from $\tilde{p}(a,b)$ is the same as that calculated from $p(a,b)$, which corresponds to Equation (3). These estimated values are not so sparse. We can thus use the above assumption to calculate $p(a,b)$. Furthermore, we maximize the entropy of the distribution $p(a,b)$ to obtain one solution for $p(a,b)$, because using only Equation (3) produces several solutions for $p(a,b)$. Maximizing the entropy makes the distribution more uniform, which is known to provide a strong solution for data sparseness problems.

Figure 2: Maximizing the margin
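The role of the entropy in Equation (4) can be seen in a small numerical sketch. The two candidate distributions, the single constraint, and its target value of 0.6 are hypothetical. Both candidates satisfy the same Equation (3)-style constraint, and the more uniform one has the larger entropy.

```python
import math

def entropy(p):
    """H(p) = -sum p(a, b) log p(a, b), skipping zero entries (Equation (4))."""
    return -sum(v * math.log(v) for v in p.values() if v > 0)

def satisfies(p, constrained_pairs, target, tol=1e-9):
    """Check one Equation (3)-style constraint: the probability mass on the
    (category, context) pairs where g_j = 1 must equal the empirical mass."""
    return abs(sum(p[pair] for pair in constrained_pairs) - target) < tol

# Hypothetical setting: two categories, two contexts, and one feature that
# occurs in both contexts; its constraint fixes the mass of "past" at 0.6.
constrained = [("past", "b1"), ("past", "b2")]

skewed = {("past", "b1"): 0.6, ("past", "b2"): 0.0,
          ("present", "b1"): 0.4, ("present", "b2"): 0.0}
spread = {("past", "b1"): 0.3, ("past", "b2"): 0.3,
          ("present", "b1"): 0.2, ("present", "b2"): 0.2}

both_valid = satisfies(skewed, constrained, 0.6) and satisfies(spread, constrained, 0.6)
h_skewed, h_spread = entropy(skewed), entropy(spread)
```

Because both candidates meet the constraint, the constraint alone cannot choose between them. Preferring the larger entropy selects the one that spreads probability onto unseen pairs.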


3.4 Support-Vector Machine Method 


In this method, data consisting of two categories is classified by dividing space with a hyperplane. When the two categories are positive and negative and the margin between the positive and negative examples in the training data is larger (see Figure 2³), the possibility of incorrectly choosing categories in open data is thought to be smaller. The hyperplane that maximizes the margin is determined, and classification is carried out by using the hyperplane. Although the basics of this method are as described above, in the extended versions of the method the inner region of the margin in the training data can include a small number of examples, and the linearity of the hyperplane can be changed to non-linearity by using kernel functions. Classification in the extended method is equivalent to classification carried out using the following discernment function, and the two categories can be classified on the basis of whether the output value of the function is positive or negative (Cristianini and Shawe-Taylor, 2000; Kudoh):

$$f(\mathbf{x}) = \mathrm{sgn}\left( \sum_{i=1}^{l} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b \right) \quad (5)$$

$$b = -\frac{\max_{i, y_i = -1} b_i + \min_{i, y_i = 1} b_i}{2}$$

$$b_i = \sum_{j=1}^{l} \alpha_j y_j K(\mathbf{x}_j, \mathbf{x}_i),$$

where $\mathbf{x}$ is the context (a set of features) of an input example, $\mathbf{x}_i$ and $y_i$ ($i = 1, ..., l$; $y_i \in \{1, -1\}$) indicate the context of the training data and its category, and the function sgn is

$$\mathrm{sgn}(x) = \begin{cases} 1 & (x \geq 0), \\ -1 & (\text{otherwise}). \end{cases} \quad (6)$$

Each $\alpha_i$ ($i = 1, 2, ...$) is fixed as the value of $\alpha_i$ when the value of $L(\alpha)$ in Equation (7) is at its maximum under the conditions of Equations (8) and (9).

$$L(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j) \quad (7)$$

$$0 \leq \alpha_i \leq C \quad (i = 1, ..., l) \quad (8)$$

$$\sum_{i=1}^{l} \alpha_i y_i = 0 \quad (9)$$

The function $K$ is called a kernel function. Although various types of kernel functions are used, we used the following polynomial function:

$$K(\mathbf{x}, \mathbf{y}) = (\mathbf{x} \cdot \mathbf{y} + 1)^d. \quad (10)$$

3 In the figure, the white and black circles indicate positive and negative examples, respectively. The solid line indicates the hyperplane that divides the space and the broken lines indicate the planes at the boundaries of the margin regions.
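Equations (5), (6), and (10), together with the definitions of $b$ and $b_i$, can be traced in a short Python sketch. It only evaluates the discernment function for given multipliers; it does not solve the optimization of Equations (7) to (9), which the paper delegates to existing software. The function names, the two-example training set, and the hand-picked values of $\alpha_i$ are assumptions for illustration.

```python
def poly_kernel(x, y, d=1):
    """Equation (10): K(x, y) = (x . y + 1)^d for dense feature vectors."""
    return (sum(xi * yi for xi, yi in zip(x, y)) + 1) ** d

def sgn(value):
    """Equation (6): 1 for value >= 0, otherwise -1."""
    return 1 if value >= 0 else -1

def raw_score(x, train_x, train_y, alphas, d=1):
    """Sum over training examples of alpha_i * y_i * K(x_i, x).
    Examples with alpha_i = 0 (non-support vectors) contribute nothing."""
    return sum(a * y * poly_kernel(xi, x, d)
               for xi, y, a in zip(train_x, train_y, alphas) if a > 0)

def bias(train_x, train_y, alphas, d=1):
    """b = -(max over negatives of b_i + min over positives of b_i) / 2."""
    b_i = [raw_score(xi, train_x, train_y, alphas, d) for xi in train_x]
    max_neg = max(v for v, y in zip(b_i, train_y) if y == -1)
    min_pos = min(v for v, y in zip(b_i, train_y) if y == 1)
    return -(max_neg + min_pos) / 2

def decide(x, train_x, train_y, alphas, d=1):
    """Equation (5): the sign of the kernel sum plus the bias."""
    b = bias(train_x, train_y, alphas, d)
    return sgn(raw_score(x, train_x, train_y, alphas, d) + b)

# Hypothetical two-example problem; these alphas satisfy Equations (8)
# and (9) with C = 1 and are chosen by hand, not produced by a solver.
train_x = [(1, 0), (0, 1)]
train_y = [1, -1]
alphas = [1.0, 1.0]
```

With these values the bias is zero, and inputs closer to the first training example are assigned the positive category.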


$C$ and $d$ are constants set by experimentation, and in this paper, $C$ is fixed as 1 for all of the experiments. Two values, $d = 1$ and $d = 2$, are used for $d$. An $\mathbf{x}_i$ that satisfies $\alpha_i > 0$ is called a support vector, and the portion performing the sum in Equation (5) is calculated using only examples that are support vectors.

Support-vector machine methods are capable of handling data consisting of two categories. In general, data consisting of more than two categories is handled using the pairwise method (Kudoh and Matsumoto, 2000).

In this method, for data consisting of N categories, all pairs of two different categories (N(N-1)/2 pairs) are constructed. The better category of each pair is determined using a 2-category classifier. In this paper, a support-vector machine⁴ was used as the 2-category classifier. Finally, the correct category is determined on the basis of “voting” on the N(N-1)/2 pairs analyzed by the 2-category classifier.

The support-vector machine method used in this paper was performed by combining the support-vector machine method and the pairwise method described above.

4 Features (information used in classification)

Although we have already explained the four machine-learning methods in the previous section, we must also define the features (information used in classification). In this section, we will explain these features.

As mentioned in Section 2, when a Japanese sentence is input, we output the category of the tense, aspect, and modality. Therefore, features are extracted from the input Japanese sentence.

We tested the following three kinds of feature sets in our experiments.

• Feature-set 1

Feature-set 1 consists of 1-gram to 10-gram strings at the ends of the input Japanese sentences and all of the morphemes from each of the sentences.

e.g. (do not), (today).

(The number of features is 230,134 in the Kodansha Japanese-English dictionary and 25,958 in the white papers.)


• Feature-set 2

Feature-set 2 consists of 1-gram to 10-gram strings at the ends of the input Japanese sentences.

e.g. (do not), (did not).

(The number of features is 199,199 in the Kodansha Japanese-English dictionary and 16,610 in the white papers.)

• Feature-set 3

Feature-set 3 consists of all of the morphemes from each of the sentences.

e.g. (today), (I), (topic-marker particle) (run).

(The number of features is 30,935 in the Kodansha Japanese-English dictionary and 9,348 in the white papers.)

4 We used the software TinySVM (Kudoh, 2000) by Kudo as a support-vector machine.

Table 2: Precisions for the Kodansha data (values in parentheses are for the closed experiments)

Method               Feature-set 1      Feature-set 2      Feature-set 3
knn (k=1)            -                  79.36% (98.50%)    -
knn (k=3)            -                  80.35% (83.94%)    -
knn (k=5)            -                  80.43% (82.39%)    -
knn (k=7)            -                  80.39% (81.71%)    -
knn (k=9)            -                  80.22% (81.30%)    -
decision list        74.19% (98.21%)    80.23% (98.18%)    67.90% (86.58%)
max. ent.            80.37% (88.87%)    81.16% (83.85%)    75.35% (84.15%)
support vec. (d=1)   82.48% (98.70%)    81.93% (98.50%)    78.68% (96.68%)
support vec. (d=2)   82.28% (98.48%)    81.37% (98.48%)    79.01% (98.74%)

baseline = 73.88%.


The Japanese morphological analyzer JUMAN (Kurohashi and Nagao, 1998) was used to divide the input sentences into morphemes.

Feature-set 1 is the combination of Feature-sets 2 and 3. Feature-set 2 was constructed based on our previous research (Murata et al., 1999). In Japanese sentences, the tense, aspect, and modality are often indicated by the verbs at the ends of sentences.⁵ Therefore, in our previous study, the strings at the ends of the sentences were used as features. Feature-set 3 was constructed by taking into consideration the fact that adverbs such as “tomorrow” and “yesterday” can also indicate tense, aspect, and modality, and must therefore be used.
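The construction of the three feature sets can be sketched as follows. The sketch assumes the sentence has already been split into morphemes (the paper uses JUMAN for this; here the segmentation is supplied by hand), and the function names, the feature tagging scheme, and the sample sentence are hypothetical.

```python
def suffix_ngrams(sentence, max_n=10):
    """Character 1-gram to max_n-gram strings at the end of the sentence
    (the Feature-set 2 features)."""
    return [sentence[-n:] for n in range(1, min(max_n, len(sentence)) + 1)]

def feature_set_2(sentence, max_n=10):
    """Sentence-final strings, tagged so they cannot collide with morphemes."""
    return {("suffix", s) for s in suffix_ngrams(sentence, max_n)}

def feature_set_3(morphemes):
    """All morphemes of the sentence."""
    return {("morph", m) for m in morphemes}

def feature_set_1(sentence, morphemes, max_n=10):
    """Feature-set 1 is the combination of Feature-sets 2 and 3."""
    return feature_set_2(sentence, max_n) | feature_set_3(morphemes)

# Hypothetical sentence ("I do not run today") with a hand-made segmentation.
sentence = "今日は走らない"
morphemes = ["今日", "は", "走ら", "ない"]
features = feature_set_1(sentence, morphemes)
```

Tagging each feature with its origin keeps a sentence-final string such as "ない" distinct from the morpheme "ない". This matches the feature counts reported above, where the size of Feature-set 1 is the sum of the sizes of Feature-sets 2 and 3.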

Defining the feature sets is sufficient for enabling the use of the decision-list, maximum-entropy, and support-vector machine methods. For the k-nearest neighborhood method, however, it is also necessary to define the similarities between examples, in addition to the feature sets. For Feature-sets 1 and 3, which use all of the morphemes from the entire input sentence, it is difficult to define the similarity. Therefore, we decided to only use Feature-set 2 for the k-nearest neighborhood method. In terms of defining similarity for Feature-set 2, when the strings at the ends of two examples match for x characters (an x-gram), the value of the similarity between them is x.
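The k-nearest neighborhood method with this similarity can be sketched in a few lines of Python. The sketch follows the description in Section 3.1, including the rule that examples tied with the k-th most similar example also vote. The cap of 10 characters mirrors the 1-gram to 10-gram range of Feature-set 2, while the toy examples and the handling of tied vote counts (not specified in the paper) are assumptions.

```python
from collections import Counter

def suffix_similarity(s1, s2, max_n=10):
    """Number of matching characters at the ends of two sentences,
    capped at max_n."""
    n = 0
    while n < min(len(s1), len(s2), max_n) and s1[-1 - n] == s2[-1 - n]:
        n += 1
    return n

def knn_classify(sentence, examples, k=3):
    """Vote among the k most similar examples. Any example whose similarity
    equals that of the k-th example also votes. A tie in vote counts is
    broken by order of first appearance (an assumption)."""
    scored = sorted(((suffix_similarity(sentence, s), c) for s, c in examples),
                    key=lambda pair: pair[0], reverse=True)
    if not scored:
        return None
    threshold = scored[min(k, len(scored)) - 1][0]
    votes = Counter(c for sim, c in scored if sim >= threshold)
    return votes.most_common(1)[0][0]

# Hypothetical training examples: (Japanese sentence, category).
examples = [
    ("走った", "past"),
    ("行った", "past"),
    ("走る", "present"),
    ("食べる", "present"),
    ("見た", "past"),
]
```

For instance, "帰った" shares its last two characters with "走った" and "行った" and its last character with "見た", so with k = 3 all three votes go to the past tense.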


5 The Japanese language is of the type SOV, so verb phrases appear at the ends of sentences.


5 Experiments 

This section describes our experiments on the translation of tense, aspect, and modality that were conducted using the machine-learning methods described in Section 3 with the feature sets described in Section 4 for the tasks described in Section 2.

First, we conducted experiments using the example sentences in the Kodansha Japanese-English dictionary. The results for these experiments are shown in Table 2. We conducted two types of experiments, closed and open.⁶ The open experiments were performed using 10-fold cross-validation. In the table, the values in parentheses indicate the precisions for the closed experiments and the values outside the parentheses are the precisions for the open experiments. (We used a baseline method for comparison. This method judges cases in which the sentence ends with the Japanese particle used for the past tense to be the past tense, and judges all other cases to be the present tense.)

We were able to learn the following from the experimental results.

• The cases of k > 1 performed better than the case of k = 1, which is the example-based method. We thus found that the k-nearest neighborhood method was more precise than the example-based method. (This had also been confirmed in our previous research (Murata et al., 1999).)

• The decision-list method had almost the same precisions as the k-nearest neighborhood method when Feature-set 2 was used.

• The maximum-entropy method was more precise than both the k-nearest neighborhood and decision-list methods.

• The support-vector machine method obtained higher precisions than all the other methods.

• In terms of comparing feature sets, the maximum-entropy and decision-list methods obtained their highest precisions when Feature-set 2 was used. They produced lower precisions when morpheme information was added, as in Feature-set 1. This is probably because the number of unnecessary features increases as the total number of features increases.

• In terms of comparing feature sets for the support-vector machine method, Feature-set 1 obtained the highest precisions. This indicates that adding the morpheme information was effective in improving precision. Since adding the morpheme information produced lower precisions for the other methods, we assume that the support-vector machine method is more capable of eliminating unnecessary features and selecting effective features than the other methods. We can provide the following explanations for these results based on the theoretical aspects of each of the methods.

- Because the decision-list method chooses the category by using only one feature, it is likely that a single unnecessary feature will be used, and the precisions are thus likely to decrease when there are many unnecessary features.

- Because the maximum-entropy method always uses almost all of the features, the precisions decrease when there are many unnecessary features.

- However, because the support-vector machine includes a function that eliminates examples, such that it only uses examples that are support vectors and does not use any other examples, it eliminates many unnecessary features along with these examples. The precisions are thus unlikely to decrease even if there are many unnecessary features.

• The method with the highest precision among all the methods was the support-vector machine method using d = 1 and Feature-set 1.

6 “Closed” means experiments that use the tested data when learning. “Open” means experiments that do not use the tested data when learning.

Table 3: Effective morpheme features

Frequency   Morpheme feature
221         (subject-case particle)
121         (is doing)
89          (is not)
28          (do)
23          (if)
23          (came)
22          (already or yet)
19          (between)
18          (recently)
16          (will)
16          (did)
14          (yet)
13          (can)
12          (if not)
11          (let’s)
11          (entirely)
10          (is done)
8           (tomorrow)
8           (how)

To confirm that using Feature-set 1 was better than using Feature-set 2 (in other words, to confirm that adding the morpheme information was effective), we conducted a sign test. This was done for the case of d = 1, which produced better results than d = 2. Among all of the 39,660 examples, the number for which the category was chosen correctly with Feature-set 1 and incorrectly with Feature-set 2 was 648. For the opposite case (chosen correctly with Feature-set 2 and incorrectly with Feature-set 1), the number was 427. The net gain of 221 examples is in line with the difference between the precisions of 82.48% and 81.93% in Table 2. We performed the sign test by using these counts and obtained results showing that a significant difference existed at a significance level of 1%. We can thus be almost completely sure that adding the morpheme information was effective.
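The sign test on the two counts reported above (648 and 427) can be reproduced with a short Python sketch. It computes an exact two-sided p-value under the null hypothesis that each of the 1,075 disagreements is equally likely to favor either feature set. The paper does not state whether its test was one-sided or two-sided or whether it used an approximation, so those choices and the function name are assumptions.

```python
from fractions import Fraction
from math import comb

def sign_test_p_value(count_a, count_b):
    """Exact two-sided sign test for paired disagreements.

    Under the null hypothesis each disagreement falls on either side with
    probability 1/2, so the larger count follows Binomial(n, 1/2).
    """
    n = count_a + count_b
    k = max(count_a, count_b)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1))
    p = Fraction(2 * upper_tail, 2 ** n)
    return float(min(p, Fraction(1)))

p_value = sign_test_p_value(648, 427)
significant_at_1_percent = p_value < 0.01
```

The exact integer arithmetic avoids underflow in the tail probabilities. The resulting p-value is far below 0.01, which agrees with the significance level stated in the text.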

Next, we examined which features were effective among all the features using the morpheme information. This was done by examining the features that appeared relatively frequently among the 648 examples for which the category was chosen correctly with Feature-set 1 and incorrectly with Feature-set 2. We used a binomial test to select, as a relatively frequently appearing feature, any feature whose occurrence rate among the 648 examples was significantly larger than among all the examples at a significance level of 1%. The 20 most frequently occurring features are listed in Table 3. As shown in the table, we were able to obtain features that are thought to be effective in determining tense, aspect, and modality, such as (already), (recently), (will), (yet), (if not), (let’s) and (tomorrow), and we believe that such features improved precisions.

Next, we carried out experiments using the white papers. These experiments were performed using the support-vector machine method that produced good precisions for the Kodansha data. The open precisions were calculated using 10-fold cross-validation in these experiments as well. The experimental results are shown in Table 4.

Table 4: Precisions for the white papers (values in parentheses are for the closed experiments)

Method               Feature-set 1      Feature-set 2      Feature-set 3
Support Vec. (d=1)   60.10% (99.81%)    56.61% (89.87%)    56.14% (96.67%)
Support Vec. (d=2)   64.67% (99.81%)    56.74% (89.87%)    62.07% (99.83%)

baseline = 49.77%.

Table 5: Precisions for the Kodansha data and the white papers (values in parentheses are for the closed experiments)

                      Training data
Test data             Kodansha and White   Kodansha only      White papers only
Kodansha data (d=1)   82.44% (98.71%)      82.48% (98.70%)    65.31%
Kodansha data (d=2)   82.31% (98.74%)      82.28% (98.48%)    51.92%
White papers (d=1)    60.02% (99.79%)      47.65%             60.10% (99.81%)
White papers (d=2)    64.01% (99.83%)      49.53%             64.67% (99.81%)

We learned the following from the results.

• The highest precision for the white paper data was 64.67%.

• Feature-set 1 produced higher precisions than Feature-set 2. Moreover, Feature-set 3 also produced higher precisions than Feature-set 2. These results again confirmed that adding the morpheme information for each of the sentences was effective in improving precisions.

We next performed experiments in which different-domain data was used as training data, such that the Kodansha data was used as the training data and the white paper data was used as the test data.⁷ We then examined how the precisions changed under these conditions. These experiments were performed using support-vector machine methods with d=1 or d=2, which both obtained good precisions. When the training data and the test data overlapped, 10-fold cross-validation was used for the overlapping part. The experimental results are shown in Table 5.

We learned the following from these results.

• When we used different-domain data as training data, the precisions greatly decreased. When the Kodansha data was analyzed using the white paper data as training data or the white paper data was analyzed using the Kodansha data as training data, the precisions decreased by about 15 to 17 percentage points (82.48% => 65.31% or 64.67% => 49.53%). We thus found that using same-domain data is more effective in terms of precision. It is difficult to construct a system adapted for different-domain data with a method that uses hand-written rules. However, for methods using machine learning, such as those described in this paper, since it is easy to change the training data to different-domain data and then have the data learned again, it is easy to construct a system adapted for different-domain data.

• When both the Kodansha and the white paper data were used as training data, the precisions were almost the same or slightly decreased. We thus found that increasing the size of the training data is not always better and that adding different-domain data is not effective.

7 Sekine carried out domain-dependent / domain-independent experiments on parsing (Sekine, 1997).


6 Conclusion 


Tense, aspect, and modality are known to present difficult problems in machine translation. In traditional approaches, tense, aspect, and modality have been translated using manually constructed heuristic rules. Recently, however, corpus-based approaches such as the example-based method (k-nearest neighborhood method) have also been applied (Murata et al., 1999). We carried out experiments on the translation of tense, aspect, and modality by using a variety of machine-learning methods, as well as the k-nearest neighborhood method, and we determined which method was the most precise. In our previous research, in which we used the k-nearest neighborhood method, only the strings at the ends of sentences were used to translate tense, aspect, and modality. However, in this study we used all of the morphemes in each of the sentences as information, as well as the strings at the ends of each of the sentences.

The support-vector machine method was found to produce the highest precisions of all the methods we tested. We were also able to obtain better translations of tense, aspect, and modality than we could by using the k-nearest neighborhood method. Furthermore, we used a statistical test to confirm that adding the morpheme information for the entire sentence, which was not used in our previous study, was effective in improving precision. We also carried out experiments using a different-domain corpus. In these experiments, we confirmed that using a different-domain corpus as the training data produced very low precisions, and that we must construct a system for translating the tense, aspect, and modality for each domain. This also indicates that approaches using machine-learning methods, such as those described in this paper, are appropriate because it would be too difficult to construct systems adapted for different domains by hand.